Diachronic Analysis of the Italian Language exploiting Google Ngram

Authors

  • Pierpaolo Basile
  • Annalina Caputo
  • Roberta Luisi
  • Giovanni Semeraro
Abstract

English. In this paper, we propose several methods for the diachronic analysis of the Italian language. We build several models by exploiting Temporal Random Indexing and the Google Ngram dataset for the Italian language. Each proposed method is evaluated on its ability to automatically identify meaning shift over time. To this end, we introduce a new dataset built from the etymological information reported in some dictionaries.

Italiano. In questo lavoro proponiamo alcuni metodi per l’analisi diacronica della lingua italiana. Abbiamo costruito differenti modelli utilizzando la tecnica del Temporal Random Indexing e Google Ngram per l’italiano. Ciascun metodo proposto è stato valutato rispetto alla capacità di identificare automaticamente i cambi di significato nel tempo. A tale scopo introduciamo un nuovo dataset costruito mediante le informazioni etimologiche presenti in alcuni dizionari.

1 Motivation and Background

Languages can be studied from two different and complementary viewpoints: the diachronic perspective considers the evolution of a language over time, while the synchronic perspective describes the language rules at a specific point in time without taking its history into account (De Saussure, 1983). In this work, we focus on the diachronic approach, since language appears to be unquestionably immersed in the temporal dimension. Language is subject to constant evolution driven by the need to reflect the continuous changes of the world. The evolution of word meanings has been studied for several centuries, but this kind of investigation has been limited by the small amount of data available for analysis. Moreover, in order to reveal structural changes in word meanings, this analysis has to explore long periods

Similar resources

Dynamics of core of language vocabulary

Studies of the overall structure of vocabulary and its dynamics became possible thanks to the creation of diachronic text corpora, especially Google Books Ngram. This article discusses the rate of change of the vocabulary core and the degree to which the core words cover the texts. Different periods of the last three centuries and the six main European languages represented in Google Books Ngram are compared. The ma...

A fully data-driven method to identify (correlated) changes in diachronic corpora

In this paper, a method for measuring synchronic corpus (dis-)similarity put forward by Kilgarriff (2001) is adapted and extended to identify trends and correlated changes in diachronic text data, using the Corpus of Historical American English (Davies 2010a) and the Google Ngram Corpora (Michel et al. 2010a). This paper shows that this fully data-driven method, which extracts word types that h...

The Fairly Good Economy: Testing The Economization Of Society Hypothesis Against A Google Ngram View Of Trends In Functional Differentiation (1800-2000)

The present article considers the economization of society a hypothesis rather than a fact. The hypothesis is tested against the results of a Google ngram viewer analysis of the most frequent function system references in the Google Books corpus for the years 1800-2000. Despite the remarkable growth figures in the English, French, and German language corpora as related to economic word frequenc...

The Unpopular Function Testing the Economization Hypothesis against a Google Ngram View of Trends in Functional Differentiation (1800-2000)

The present article considers the economization of society a hypothesis rather than a fact. The hypothesis is tested against the results of a Google ngram viewer analysis of the most frequent function system references in the Google Books corpus for the years 1800 through 2000. Despite the remarkable growth figures the economic word frequency shares feature in the English, French, and German lan...

Peachnote: Music Score Search and Analysis Platform

Hundreds of thousands of music scores are being digitized by libraries all over the world. In contrast to books, they generally remain inaccessible for content-based retrieval and algorithmic analysis. There is no analogue to Google Books for music scores, and there exist no large corpora of symbolic music data that would empower musicology in the way large text corpora are empowering computati...

Publication date: 2016